Mining Data Streams with Skewed Distribution based on Ensemble Method

Author

  • Yi Wang
Abstract

In recent years, there have been some interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams and cannot handle skewed distributions well (e.g., few positives but many negatives), although such distributions are typical in many data stream applications. In this paper, we propose an ensemble- and cluster-based sampling method to deal with this situation. The study shows that this method is effective for mining skewed data streams.
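The abstract names the ingredients (an ensemble plus cluster-based sampling) but gives no algorithmic detail, so the following is only a minimal sketch of one plausible reading, not the author's method. It assumes the stream arrives in chunks, that each ensemble member is trained on all positives seen so far plus k-means centroids standing in for the negatives, that members are 1-nearest-neighbour classifiers, and that the ensemble averages their votes. Every function name, the synthetic data, and the parameter values are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans_centroids(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns at most k centroids."""
    centroids = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def train_member(positives, negatives, rng):
    """Assumed scheme: all positives plus one negative centroid per positive."""
    reps = kmeans_centroids(negatives, len(positives), rng)
    return [(p, 1) for p in positives] + [(n, 0) for n in reps]

def predict_member(member, x):
    """1-nearest-neighbour label over the member's balanced sample."""
    return min(member, key=lambda item: dist2(item[0], x))[1]

def predict_ensemble(ensemble, x):
    """Fraction of members voting positive."""
    return sum(predict_member(m, x) for m in ensemble) / len(ensemble)

def make_chunk(rng, n_neg=200, n_pos=5):
    """Synthetic skewed chunk: many negatives near (0, 0), few positives near (4, 4)."""
    neg = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_neg)]
    pos = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(n_pos)]
    return pos, neg

rng = random.Random(0)
ensemble, seen_pos = [], []
for _ in range(4):
    pos, neg = make_chunk(rng)
    seen_pos.extend(pos)           # carry minority examples across chunks
    ensemble.append(train_member(seen_pos, neg, rng))
    ensemble = ensemble[-3:]       # keep only the three newest members

print(predict_ensemble(ensemble, (4.0, 4.0)))  # query near the positive cluster
print(predict_ensemble(ensemble, (0.0, 0.0)))  # query near the negative cluster
```

Replacing the negatives with cluster centroids balances each training set without discarding whole regions of the majority class, which is the usual motivation for cluster-based under-sampling. Whether the paper balances classes in exactly this way is not stated in the abstract.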

Similar resources

Investigation of linear and non-linear estimation methods in highly-skewed gold distribution

The purpose of this work is to compare the linear and non-linear kriging methods in the mineral resource estimation of the Qolqoleh gold deposit in Saqqez, NW Iran. Considering the fact that the gold distribution is positively skewed and has a significant difference with a normal curve, a geostatistical estimation is complicated in these cases. Linear kriging, as a resource estimation method, c...

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

In recent years, there have been some interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams but cannot handle well rather skewed (e.g., few positives but lots of negatives) and stochastic distributions, which are typical in many data stream applications. In this paper, we propose a new approach to mine data stre...

Min-wise independent sampling from skewed data streams

Min-wise independent hashing is a powerful sampling technique for estimating the similarity between sets. In particular, it has proved to be ubiquitous for mining data streams of large volume where the input sets are revealed in arbitrary order and the elements in a given set do not arrive consecutively. More precisely, for sets of elements E and attributes A the input is a stream of element-at...

Improved Counter Based Algorithms for Frequent Pairs Mining in Transactional Data Streams

A straightforward approach to frequent pairs mining in transactional streams is to generate all pairs occurring in transactions and apply a frequent items mining algorithm to the resulting stream. The well-known counter based algorithms Frequent and Space-Saving are known to achieve a very good approximation when the frequencies of the items in the stream adhere to a skewed distribution. Motiva...

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

Journal title:
  • IJAPUC

Volume 4, Issue 

Pages -

Publication date: 2012